1 Introduction

Music has an incredibly fascinating effect on people. Humans are one of the few, if not the only, animals that can naturally feel rhythms. When music is played, it only takes moments for people to clap along to the beat of the song. Specific genres and styles of music have defined generations and cultures time and time again. It’s only natural for music to be so powerful for people.

This notion has attracted considerable media attention. A few years ago, ABC Science released a video discussing the power of music on the brain, specifically for people with dementia and Parkinson’s. In the first part of the video, researchers showed how playing music from a person’s past can trigger memories, even for someone who has forgotten them because of dementia. They did so by curating playlists tailored to the pasts of certain residents at a nursing home. When the dementia patients listened to these playlists, many of them had a sudden shift in mood. Their family members were surprised to see them “come back to life,” in a sense. These residents would talk about their past and be more open and happy than they had been in months, very different from the behavior typically displayed by people with dementia.

Another segment of the video showed how music can help with diseases that affect the motor system, like Parkinson’s. At a human movement lab, a professor has spent years studying these movement disorders and looking for ways to treat them. One thing she tried was playing music, and the change that comes over her patients is striking. John, one of her patients, has Parkinson’s. The video shows how the debilitating disease limits his motor skills and prevents voluntary movement. However, once some classical music is played, John is able not only to walk, but even to dance with a partner.

The last part of the video follows a man named Shane who suffered severe brain damage from a bike accident. Following the accident, he could barely speak, move, or recall anything. Later, he became part of an experiment to see if music could evoke memories in people who had severe memory loss from brain injuries. The results showed that he did just as well as people with healthy brains, despite his injury. For example, Shane had trouble recalling a memory from grade school when simply asked to, but had no trouble doing so while a song from his grade school years was playing.

This video illustrates not only the power of music, but also how deeply it is rooted in human nature. It is much more than a form of entertainment; it is part of a complete life. The rhythms and tunes of songs, whether from traditional instruments, natural sounds, or digital playlists, bury themselves in our brains and become an integral part of human life.

Additionally, music is not immune to societal trends that shift over time. Since it is a form of art and expression, it generally reflects the major feelings and emotions circulating at a given time. The prevalence of music in our lives combined with its artistic essence makes it a strong vantage point when looking at how society expresses itself. For example, there are periods in the late 1940s and 1950s when songs tend to be filled with sad or melancholic lyrics, making for more “low-mood” songs. This likely has much to do with the major conflicts of the time, namely World War II and the subsequent Cold War.

Other trends in music, and art in general, can be just as informative. Seeing how artists express their views of the world provides a look into life during that time period. Another advantage of music is that the popularity of a song or genre can be tracked. This is important because it gives insight not only into what songs are being produced at a given time, but also into the extent to which they are consumed. So, it is possible to look at both the emotions expressed in music and how much they resonate with listeners of that time.

The role of music changes not only across historical periods but also across the phases of a person’s life. For example, there is a popular practice of pregnant mothers playing classical music for their babies in hopes of making them more intelligent. In another phase of life, teens may be warned to stay away from certain genres of music because they will “rot their brains” or act as a bad influence. In either case, music enters our lives early and remains a part of them from then on. Many young musical prodigies are discovered because they begin reacting to music even before they are able to speak or walk.

Our plan for this study relies on a comprehensive data set obtained from the Spotify API. The data set contains many variables, including but not limited to year released, popularity, and acousticness, and provides fertile ground for exploring how music affects humans. Two variables in particular stand out: valence, a measure between 0 and 1 of the happiness of a track, and explicitness, a binary variable indicating whether a track contains explicit content (1) or not (0). There is a perceived trend that current music (2010-2020) is becoming more and more explicit and despairing. Figure 1.1 gives some support for the second part of this perception: valence appears to rise after 1950 and then decline from a little after 1975.

Figure 1.1: Scatterplot of Spotify’s valence score by year

This study aims to take a closer look at correlations between the music data set and other data sets in mental health/happiness and crime. Is it possible that the prevalence of explicit music is rising with crime rates? Is it possible that the decrease in valence score is correlated with an increase in mental health cases? These questions will be discussed in the subsequent sections.

2 Data

The data sets used in this project required considerable work to clean and prepare for visualization and analysis. The data cleaning code is outlined in the ATABAS_HUANG_dataCleaning.Rmd file, with a short description attached to each cleaning procedure.

Music:

The music data set is made available through the Spotify API. Spotify is an audio streaming service that was launched in 2008 and is now one of the most popular streaming platforms for music and podcasts. Because so much music is consumed through Spotify, the data set is fairly comprehensive.

The data encompasses many interesting variables. There are descriptors such as artist or band name, year of song, genre, and key, as well as many numeric metrics. The two most noteworthy for this project are valence and explicitness. Spotify describes the valence (happiness) of a song as follows: “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)”. Explicitness is described as follows: “Whether or not the episode has explicit content (true = yes it does; false = no it does not OR unknown)”.

Crime:

The United Nations Office on Drugs and Crime (UNODC) maintains many comprehensive data sets on crime in its different forms. The organization, as part of the United Nations, addresses pressing concerns related to crime in general, with a focus on tackling and preventing it. The data sets on its website are grouped by type of crime, with further subsets for more detail. For example, distinctions are made between sexual violence, homicide, and assault. Further, assault can be analyzed more specifically by mechanism (firearm, sharp object, serious, etc.).

With a data set of this comprehensive nature, it is possible to make different comparisons by extracting the parts of the data that are of interest. In this case, it was of interest to study the relationship between trends in crime and trends in explicit content in music. There are many studies that have shown types of music to have a positive impact (classical music, joyous music). However, there are also claims that media in the form of video games, TV, and/or music can have an equally negative impact depending on their contents. Along these lines, it may be interesting to explore any relationships between explicit music and crime.

The limitations of this data set originate mainly from the collection process. The data is collected through the annual United Nations crime trends survey, which is conducted by the UNODC and takes responses from government officials in each country. This method relies not only on countries reporting to the U.N., but also on local authorities within each country reporting to the national level. There may be differences in how crime is reported across countries, because each country may have different laws regarding different crimes. Even states or counties within a given country can have different guidelines for recording a crime. This should be kept in mind when drawing conclusions.

Since these data sets were provided in separate files, it was necessary to read the Excel files separately and then merge them by year.

The data also comes in a format that is not easy to work with, having a column of rate and count for each year (i.e. rate of crime, count of crime). The years available are from 2010 to 2017.
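The reshaping described above can be sketched as follows. This is a minimal Python illustration, not the authors' cleaning code (which lives in the Rmd file and is not shown here); the column names such as `rate_2010` and `count_2010`, the function name `wide_to_long`, and all values are assumptions made up for the example.

```python
import re

# Hypothetical wide rows: one row per country, with a rate and a count column per year.
# Column names and values are invented for illustration only.
wide_rows = [
    {"country": "A", "rate_2010": 5.0, "count_2010": 50, "rate_2011": 6.0, "count_2011": 62},
    {"country": "B", "rate_2010": 1.5, "count_2010": 30, "rate_2011": None, "count_2011": None},
]

def wide_to_long(rows):
    """Turn per-year rate/count columns into one row per (country, year)."""
    long_rows = {}
    for row in rows:
        for column, value in row.items():
            match = re.fullmatch(r"(rate|count)_(\d{4})", column)
            if match is None:
                continue  # skip identifier columns such as "country"
            measure, year = match.group(1), int(match.group(2))
            key = (row["country"], year)
            record = long_rows.setdefault(key, {"country": row["country"], "year": year})
            record[measure] = value
    # Sort by country, then year, so the output order is predictable.
    return [long_rows[key] for key in sorted(long_rows)]

long_rows = wide_to_long(wide_rows)
```

With one row per country and year, files for different crime types can be merged on the `year` (and country) fields, and missing responses stay visible as `None` rather than disappearing.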

The data from UNODC will be used to study the relationship between crime rates and explicit content in music.

World Happiness:

The first World Happiness Report was released on April 1, 2012 as a foundational text for the U.N. High Level Meeting: Well-being and Happiness: Defining a New Economic Paradigm, drawing international attention. The report outlined the state of world happiness, causes of happiness and misery, and policy implications highlighted by case studies. In 2013, the second World Happiness Report was issued, and since then has been issued on an annual basis with the exception of 2014. The report primarily uses data from the Gallup World Poll.

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors.

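Data is collected from people in over 150 countries. Each variable measured reveals a population-weighted average score on a scale running from 0 to 10 that is tracked over time and compared against other countries. These variables currently include real GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption.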

Each country is also compared against a hypothetical nation called Dystopia. Dystopia represents the lowest national averages for each key variable and is, along with residual error, used as a regression benchmark. The six metrics are used to explain the estimated extent to which each of these factors contributes to increasing life satisfaction when compared to the hypothetical nation of Dystopia, but they themselves do not have an impact on the total score reported for each country.

There are several limitations to the data set. The data set used in this project only contains data from the years 2015 to 2020, despite the World Happiness Report starting in 2012. Additionally, although other variables such as real GDP per capita, social support, and healthy life expectancy are measured, they are not used to scale or normalize the happiness score. This means that the happiness score is based purely on how respondents in the samples rated their current life on the Cantril ladder survey. For the purpose of this study, the yearly data sets were combined into one large happiness data set. The “United Arab Emirates” row was removed from the 2018 World Happiness data set because its perception of corruption value was a character string (“N/A”), which prevented that year from being bound to the other data sets. These issues should also be kept in mind when drawing conclusions.
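A minimal sketch of this combining step is given below, in Python rather than the authors' R. The function names `is_number` and `combine_years`, the field names, and the rows are all hypothetical; the sketch only shows the idea of dropping a row whose supposedly numeric field holds a string such as “N/A” before stacking the yearly tables.

```python
def is_number(value):
    """Return True when value can be read as a float (so "N/A" is rejected)."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True

def combine_years(tables, numeric_field):
    """Stack yearly tables, tagging each row with its year and skipping bad rows."""
    combined = []
    for year, rows in sorted(tables.items()):
        for row in rows:
            if not is_number(row[numeric_field]):
                continue  # e.g. a corruption value recorded as "N/A"
            cleaned = dict(row, year=year)
            cleaned[numeric_field] = float(row[numeric_field])
            combined.append(cleaned)
    return combined

# Invented rows; "X" and "Y" are placeholder country names.
tables = {
    2017: [{"country": "X", "score": 6.1, "corruption": 0.20}],
    2018: [{"country": "X", "score": 6.2, "corruption": "0.21"},
           {"country": "Y", "score": 6.8, "corruption": "N/A"}],
}
happiness = combine_years(tables, "corruption")
```

Dropping the row is the simplest choice and mirrors what the text describes; an alternative would be to keep the country and record the corruption value as missing, which preserves its happiness score.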

3 Exploration

3.1 Spotify

Naturally, there are types of music that go together. For example, it is unlikely that you would listen to a slow, acoustic song in a nightclub. Likewise, it is unlikely that “sad” songs are going to have lots of beat drops and high energy/electric music. All of these relationships can be explored with the Spotify data set and its many features. The pairs plot in Figure 3.1 explores these relationships and correlations.

Figure 3.1: Pairs plot for the variables in the Spotify dataset

Unfortunately, this process is very computationally taxing to run, so it was necessary to use the “spotify_by_year” data set instead of the larger “spotify” data set. The “spotify_by_year” data set takes averages for each year and does not include the explicitness variable. Therefore, this pairs plot neglects the explicitness variable and each point is a year, rather than an individual song.
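How a yearly data set of this kind can be derived from track-level data is sketched below. This is an assumption about the general approach (a per-year mean of each numeric feature), not a description of how the “spotify_by_year” file was actually produced; the function name `average_by_year` and the track values are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented track-level rows; real tracks carry many more features.
tracks = [
    {"year": 1950, "valence": 0.40, "energy": 0.30, "explicit": 0},
    {"year": 1950, "valence": 0.60, "energy": 0.50, "explicit": 0},
    {"year": 2015, "valence": 0.50, "energy": 0.70, "explicit": 1},
]

def average_by_year(rows, features):
    """Collapse tracks to one row per year holding the mean of each feature."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["year"]].append(row)
    return [
        {"year": year, **{f: mean(r[f] for r in group) for f in features}}
        for year, group in sorted(grouped.items())
    ]

# Explicitness is left out of the feature list, as in the yearly data set.
by_year = average_by_year(tracks, ["valence", "energy"])
```

Each output row is a year, which is why every point in a plot built from such a table represents a year rather than a song.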

The Shiny app below takes a closer look at each of the relationships in the above pairs plot (Figure 3.1). Each relationship can be plotted in a scatter plot both with and without a line.

Aside from the individual scatter plots, it is interesting to see that many of the variables have bimodal distributions. Since the data span a long and varied time frame (an entire century), this may be an effect of changing tastes over time. Changes in music usually reflect society’s expressed mood (both in terms of production and consumption), so this is not a surprise. This is consistent with Figure 1.1, in which valence (the happiness of a song) fluctuates over time. This is especially clear in the transition from the “roaring 20s” to the 40s and 50s. The detrimental effects of World War II may even be visible in music!

[Shiny app: interactive scatter plots of the Spotify variables; not available in this static document]

3.2 Happiness

Before we begin combining the Spotify data set with other data sets, it’s important to look at those other data sets individually. For example, how does the happiness data set change over each year? How does happiness score vary across countries? These questions can be explored in Figure 3.2. Figure 3.2 is an interactive map that displays the change in world happiness of countries over the years from 2015 to 2020. Hovering over the countries reveals country name, happiness rank, and happiness score.

Figure 3.2: Map of World Happiness from 2015 - 2020

The ridges plot shows the density of happiness scores across different regions of the world. Note that regions are split differently in 2015 and 2016 than in 2020, so the plots may not be directly comparable, as different countries may be included in different regions. Additionally, the 2017, 2018, and 2019 data do not split countries into regions, so ridges plots are not available for those years. The most noticeable trend is that Western Europe, North America, and Australia and New Zealand tend to have the highest happiness scores across all three years. Other noticeable features include the regions that have multiple modes, such as Sub-Saharan Africa and South Asia. This pattern shows that countries within a region can differ widely in happiness and tend to “cluster” around certain scores. This could indicate that some countries in a region are doing poorly in terms of happiness while others are not.

Figure 3.3: World Happiness from 2015 - 2020

Figure 3.3 above shows the mean happiness of the world over time. We can see that after 2017, the happiness score steadily increases. The range of happiness score is typically from 0 to 10, where 10 would be the happiest, so it’s important to note that while there is an overall increase in happiness after 2017, the average only goes up by about 0.125 points. It’s also necessary to note again that this data set only includes 2015 to 2020, so it’s difficult to tell if there’s an actual upward trend in happiness, or if it’s just some random “noise.”

3.3 Crime

Continuing with the exploration, it is time to look at the crime data. The crime data comes from the United Nations Office on Drugs and Crime and depends on reports from each country. This can create a discrepancy because each country could have different considerations and guidelines on crime, which will dictate if a specific incident enters their records or not.

Plotting each type of crime by year and region, as in Figure 3.4, shows which types of crime are prevalent in which region, along with some notable changes over time. These changes are present in some regions but not in others. For example, Europe does not have much fluctuation in crime rates, whereas the Americas show a decline in Assault from 2010-2017. Additionally, Oceania has higher rates of Assault and Sexual Violence but lower rates of Robbery compared to other regions.

Figure 3.4: Rates of Crime by Year and Region

Looking at the crime data split by region allows us to see that crime rates vary greatly across regions, and that there are different trends within each type of crime. The same variation exists among the countries of a given region. When the data is viewed as a whole, as in Figure 3.5 below, it is apparent that, globally, assault and robbery do not change much, but there is an increasing trend in sexual violence. There are several possible reasons for this: 1) sexual violence could be reported more than other crimes; 2) changing laws and the re-classification of certain types of assault as sexual violence could lead to this increase; 3) sexual violence could simply be increasing around the world. In any case, there is a global increase in recorded sexual violence crimes, according to the UNODC data.

Figure 3.5: Rates of Crime by Type and Year

Ways of reporting crime can change over time and vary by country. Notwithstanding these limitations, it is still worthwhile to examine the data that is available. The Shiny app allows country crime rates to be viewed and filtered by crime type, region, and subregion.

Countries in the same geographic region can be similar to a great degree. The U.S.A. and Canada largely share a language and have much in common. The same can apply to Spanish-speaking countries in South America. However, countries can also differ in many ways: culturally, linguistically, and in terms of crime.

The Shiny app below illustrates this notion, since countries with high rates of violence are grouped with countries that have lower rates (see, for example, the Shiny app with Assault, Americas, Northern America selected). Bermuda has a much higher crime rate than Canada and the U.S.A., and this can pull up the mean crime rate for the Americas.

[Shiny app: country crime rates filterable by crime type, region, and subregion; not available in this static document]
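The effect of a single high-rate country on a regional mean can be illustrated with a few lines of Python. The rates below are invented and the country labels are placeholders; they are not UNODC figures.

```python
from statistics import mean, median

# Invented assault rates per 100,000 population for three unnamed countries.
rates = {"country_1": 40.0, "country_2": 55.0, "country_3": 700.0}

regional_mean = mean(rates.values())      # pulled up by country_3
regional_median = median(rates.values())  # unaffected by how extreme country_3 is
mean_without_outlier = mean(
    value for name, value in rates.items() if name != "country_3"
)
```

In this toy case the mean is several times larger than the median, which is one reason a regional average can say little about a typical country in the region.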

The Shiny app makes it possible to see trends in the countries of a given region and how they compare to each other. For example, it is clear that Hungary and the Czech Republic have the highest rates of assault in the Eastern Europe region. Similar observations can be made in other regions, each of which has a country that stands out.

Also, the lines through the points show that not all countries have data for every year, which is a limitation. The methods used later will make use of the general trends and not data per country. The UNODC does not receive a response from every country in a given year, so the data cannot be assumed to be complete.

4 Comparing Datasets/Variables

4.1 Valence and Happiness

Figure 4.1: Valence scores from 2015-2020

Figure 4.1 is a subset of Figure 1.1, limited to the years 2015 to 2020, as these are the only years available in our happiness data. Figure 4.1 has a very similar trend to the happiness plot in Figure 3.3: after 2017, valence also steadily increases. But the same limitations apply: the increase in valence from 2017 to 2020, despite continually trending upward, is only about 0.08. Additionally, this is a very small set of years from the whole data set, so it is impossible to say whether valence is increasing again, or if this is just a small “hiccup” in the points.

The Spotify and happiness data sets are then combined in order to compare the valence and happiness.

Figure 4.2: Comparing Happiness Scores and Valence Scores over the Years 2015-2020

In Figure 4.2 above, the music and happiness data sets were combined in order to compare valence and happiness score. As a result, each year now has one valence value and one (mean) happiness score, so each point is a year. There does appear to be a trend here: as the valence score increases, the mean happiness score also increases. Again, note that the increases in both valence and happiness score are very small.

Figure 4.2 is a somewhat incomplete representation of happiness, at the very least, because each country has a different happiness score. Because the happiness score of each country is available for each year, plotting those scores would represent happiness better than the mean happiness score does.

Figure 4.3: Box plot of valence scores overlayed with points from the dataset

In Figure 4.3 above, each individual country’s happiness score was plotted against a valence value, with boxplots overlaid on top, so that each point now represents a country instead of a year. The plot is interactive and displays the country name, happiness score, and year of each point. Unfortunately, since the music data set does not have a country variable, each data point (each country) is still only associated with one valence value per year, so each valence value essentially represents a year. Therefore, valence was treated as a categorical variable in the above plot so that the boxplots could be overlaid on the points, which means the distances between the valence values are not proportionate. The last four boxplots show that the median happiness score, as well as the 75th percentile, increases as valence increases. The changes are minute, but the changes in Figure 4.2 are also very small, so the two plots are consistent.

Although it does appear that happiness score increases as valence increases in the last four boxplots, note that the last boxplot has a larger range in general. So while it looks as if happiness score is increasing, it could just be that the range of the score is increasing. Further investigation is required in order to fully compare these two variables.

4.2 Crime and Explicitness

In Figure 3.4, there was evidence that crime changed over time, even if only slightly. The question now is whether this change was accompanied by changes in explicit music over time. To start the comparison, the percentage of explicit music per year was calculated. This serves as a general indicator of how prevalent explicit music was in a year. Percent explicit music increases continually over the selected time frame of 2010 - 2017, as indicated by the darkening shade of the colors in Figure 4.5 and by the plot below (Figure 4.4).

Figure 4.4: Percent changes in explicit and non-explicit music over time

The distinctive spread of the data into horizontal lines is due to the percent explicit scores. Each point on the plot is a country, and each country can have its own crime rate in a given year. However, every country in a given year has the same percent explicit score, since the music data is available as a whole and not on a country-by-country basis. For example, Burundi can have a different crime rate from Sudan in 2010, but they would both have the same percent explicit score.
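The calculation and join described above can be sketched as follows. This is a hedged Python illustration rather than the authors' code; the function name `percent_explicit_by_year`, the field names, the placeholder countries "P" and "Q", and every value are assumptions.

```python
from collections import defaultdict

# Invented track-level rows: explicit is 1 for explicit tracks, 0 otherwise.
tracks = [
    {"year": 2010, "explicit": 0}, {"year": 2010, "explicit": 1},
    {"year": 2010, "explicit": 0}, {"year": 2010, "explicit": 0},
    {"year": 2011, "explicit": 1}, {"year": 2011, "explicit": 0},
]
# Invented country-year crime rates per 100,000 population.
crime = [
    {"country": "P", "year": 2010, "rate": 12.0},
    {"country": "Q", "year": 2010, "rate": 80.0},
    {"country": "P", "year": 2011, "rate": 11.0},
]

def percent_explicit_by_year(rows):
    """Return {year: percentage of that year's tracks flagged explicit}."""
    totals = defaultdict(lambda: [0, 0])  # year -> [explicit tracks, all tracks]
    for row in rows:
        totals[row["year"]][0] += row["explicit"]
        totals[row["year"]][1] += 1
    return {year: 100.0 * e / n for year, (e, n) in totals.items()}

percent_explicit = percent_explicit_by_year(tracks)

# Join on year only: the music data has no country, so every country in a
# given year receives the same percent explicit value.
merged = [
    dict(row, percent_explicit=percent_explicit[row["year"]])
    for row in crime
    if row["year"] in percent_explicit
]
```

Because the join key is the year alone, countries "P" and "Q" share one percent explicit value in 2010, which is what produces rows of points at a common height when percent explicit is on the vertical axis.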

Figure 4.5: Comparing rate of crime (for 100,000 population) and percent explicit music per year for 2010-2017, displayed by region for each type of crime: assault, robbery, and sexual violence

There are a few outliers, so the log rate of crime was used on the x-axis instead of the raw rate. A logistic regression was used because Explicit is a binary variable and because percent explicit music, which we opted to use, is a proportion. The logistic regressions do not show a strong relationship between the rate of crime and explicit music. The strongest relationship is for sexual violence, which makes sense since this type of crime had the greatest increase. It appears that as the (log) rate of sexual violence goes up, so does (percent) explicit music, whereas there is a negative trend between assault/robbery and percent explicit music.
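For readers unfamiliar with this kind of model, the sketch below fits a logistic curve to a proportion using the log of a rate as the predictor. It is only an illustration of the general technique: the exact model specification used for Figure 4.5 is not shown in this report, the fitting method here (plain gradient descent on the logistic cross-entropy) is a stand-in chosen to keep the example dependency-free, and all names and numbers are invented.

```python
import math

def fit_logistic(x, y, steps=5000, learning_rate=0.5):
    """Fit p = 1 / (1 + exp(-(a + b * (x - mean(x))))) by gradient descent.

    y holds proportions in [0, 1]; the loss is the logistic cross-entropy.
    The intercept a applies to the centered predictor.
    """
    x_mean = sum(x) / len(x)
    centered = [xi - x_mean for xi in x]
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(centered, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += (p - yi) / len(x)
            grad_b += (p - yi) * xi / len(x)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b, x_mean

# Invented values: crime rates per 100,000 and the share of explicit tracks.
rates = [2.7, 4.5, 7.4, 12.2, 20.1]
share_explicit = [0.10, 0.15, 0.22, 0.30, 0.40]

# The log transform compresses large rates so outliers have less leverage.
log_rates = [math.log(r) for r in rates]
intercept, slope, x_mean = fit_logistic(log_rates, share_explicit)

def predict(rate):
    """Predicted share of explicit music for a given raw crime rate."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * (math.log(rate) - x_mean))))
```

A positive `slope` corresponds to the pattern described for sexual violence (explicit share rising with the log rate), and a negative one to the pattern described for assault and robbery. Either way, the sign of a slope describes an association in the data and says nothing about cause.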

5 Conclusions & Discussions

Starting with the Spotify data set and its various variables, it was possible to look at how music changed over time. Changes in music and happiness trends can be indicative of how the world is changing. The epochs associated with World War II had low valence (happiness) scores, and other epochs of relative peace and comfort had happier music.

Looking at the different trends of music, we thought that it may be interesting to compare these to changes in happiness and crime. Music has a strong impact on people, so it may be reflected in patterns in happiness or crime.

The happiness data set comes from the World Happiness Report and spans the years 2015 - 2020. It assigns each country a happiness score from 0 to 10 and ranks the countries from happiest to least happy. Many other variables are included, covering but not limited to health, government, and major socio-economic conditions. This data set was compared with valence from the Spotify data set for the years 2015 - 2020.

Across the valence scores for these years, there did appear to be a positive relationship between happiness and valence, but there does not seem to be a strong change in the median happiness score overall. This may require further study because of the limited number of years, and it may also be better to relate this data by region or some other subset of the global data. Countries that experienced major increases in happiness can be masked by the consistently low-scoring countries in these reports, and vice versa. It is notable that the past three years have seen general increases in both valence and happiness scores.

The association between crime and explicit music resulted in a similar message. After exploring the crime data, it was possible to see how the three different crimes (assault, robbery, and sexual violence) changed over time. Sexual violence seems to be increasing over the years while the other types of crime, assault and robbery, are either in stagnation or slightly decreasing. Of course, these trends change depending on the region of the world but, on the whole, these are the general trends found in this report.

Explicit music, on the other hand, is definitely on the rise. We were interested in looking into whether this social expression of increased explicitness was connected to any rises in crime. The logistic regression model provided little evidence of this, because the two variables did not have a strong relationship. The only positive relationship was between sexual violence and percent explicit music. One takeaway is that sexual violence rates are on the rise, as is explicit music. These two items, even if they are not strongly connected to each other, are both becoming more and more pervasive in our daily lives.

It may seem spurious to correlate music and crime. With observational studies like this one, it’s necessary to emphasize that correlation does not equal causation, but it is important to understand that music is a form of art and expression, and societal tendencies can be expressed through the art of its time. As artists and people reflect on what is going on around them, they will produce or consume different types of music that may resonate with them and the current state of society at the time.

We used these three data sets to explore trends, comparisons, and visualizations that may offer meaning and insight. Many of our visualizations show how each data set changed over time, and how these changes are mirrored across the data sets.